Compiler-Enhanced Incremental Checkpointing

نویسندگان

  • Greg Bronevetsky
  • Daniel Marques
  • Keshav Pingali
  • Radu Rugina
چکیده

As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures in that it allows applications to periodically save their state and restart the computation after a failure. Although a variety of automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains by far the most popular approach because of its superior performance. This paper focuses on improving the performance of automated checkpointing via a compiler analysis for incremental checkpointing. This analysis is shown to significantly reduce checkpoint sizes (upto 78%) and to enable asynchronous checkpointing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment

Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...

متن کامل

A Compiler-based Approach to Fault-Tolerance in Real-Time Systems

Real-time systems are characterized as systems in which the correctness of the system depends both on agreement of the result of the computation with the intended semantics, and on the timeliness with which the results are produced. In real-time systems, there are time-critical events which must occur within speci-ed timeframes or by given deadlines. Missing a deadline may result in a problem l...

متن کامل

Taking Point Decision Mechanism for Page-level Incremental Checkpointing based on Cost Analysis of Process Execution Time

Incremental checkpointing, which is intended to minimize checkpointing overhead, saves only the modified pages of a process. This means that in incremental checkpointing, the time consumed for checkpointing varies according to the amount of modified pages. Thus, efficient intervals of checkpointing have to be determined on run-time of a process. In this paper, we present an efficient and adapti...

متن کامل

Compiler-assisted Full Checkpointing

This paper describes a compiler-based approach to checkpointing for process recovery. The implementation is transparent to both the programmer and the hardware. The compiler-generated sparse potential checkpoint code maintains the desired checkpoint interval. Adaptive checkpointing reduces the size of the checkpoints. Training is used to select low-cost, high-coverage potential checkpoints. The...

متن کامل

Compiler-Assisted Checkpointing

In this paper we present compiler-assisted checkpointing, a new technique which uses static program analysis to optimize the performance of checkpointing. We achieve this performance gain using libckpt, a checkpointing library which implements memory exclusion in the context of user-directed checkpointing. The correctness of user-directed checkpointing is dependent on program analysis and inser...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007